We start with an exploratory data analysis and preprocessing.

hirs <- read.table("hirsutism.dat", header = T, sep = "\t", fill = TRUE)

summary(hirs)
##    Treatment          FGm0            FGm3             FGm6       
##  Min.   :0.000   Min.   :14.57   Min.   : 4.381   Min.   : 1.786  
##  1st Qu.:1.000   1st Qu.:16.23   1st Qu.: 9.557   1st Qu.: 7.202  
##  Median :2.000   Median :17.65   Median :12.643   Median :10.286  
##  Mean   :1.535   Mean   :18.57   Mean   :13.084   Mean   :10.853  
##  3rd Qu.:3.000   3rd Qu.:20.17   3rd Qu.:16.219   3rd Qu.:14.204  
##  Max.   :3.000   Max.   :28.36   Max.   :25.637   Max.   :23.411  
##                                                                   
##      FGm12           SysPres         DiaPres          weight      
##  Min.   :-1.163   Min.   : 88.0   Min.   :46.00   Min.   : 41.00  
##  1st Qu.: 5.093   1st Qu.:110.0   1st Qu.:65.00   1st Qu.: 57.00  
##  Median : 7.524   Median :115.0   Median :70.00   Median : 64.00  
##  Mean   : 8.911   Mean   :115.9   Mean   :70.04   Mean   : 68.06  
##  3rd Qu.:12.101   3rd Qu.:120.0   3rd Qu.:75.00   3rd Qu.: 74.50  
##  Max.   :22.759   Max.   :162.0   Max.   :95.00   Max.   :113.00  
##                   NA's   :8       NA's   :8       NA's   :8       
##      height     
##  Min.   :1.480  
##  1st Qu.:1.580  
##  Median :1.610  
##  Mean   :1.613  
##  3rd Qu.:1.650  
##  Max.   :1.800  
##  NA's   :8
hirs$Treatment <- as.factor(hirs$Treatment)

summary(hirs)
##  Treatment      FGm0            FGm3             FGm6            FGm12       
##  0:23      Min.   :14.57   Min.   : 4.381   Min.   : 1.786   Min.   :-1.163  
##  1:26      1st Qu.:16.23   1st Qu.: 9.557   1st Qu.: 7.202   1st Qu.: 5.093  
##  2:24      Median :17.65   Median :12.643   Median :10.286   Median : 7.524  
##  3:26      Mean   :18.57   Mean   :13.084   Mean   :10.853   Mean   : 8.911  
##            3rd Qu.:20.17   3rd Qu.:16.219   3rd Qu.:14.204   3rd Qu.:12.101  
##            Max.   :28.36   Max.   :25.637   Max.   :23.411   Max.   :22.759  
##                                                                              
##     SysPres         DiaPres          weight           height     
##  Min.   : 88.0   Min.   :46.00   Min.   : 41.00   Min.   :1.480  
##  1st Qu.:110.0   1st Qu.:65.00   1st Qu.: 57.00   1st Qu.:1.580  
##  Median :115.0   Median :70.00   Median : 64.00   Median :1.610  
##  Mean   :115.9   Mean   :70.04   Mean   : 68.06   Mean   :1.613  
##  3rd Qu.:120.0   3rd Qu.:75.00   3rd Qu.: 74.50   3rd Qu.:1.650  
##  Max.   :162.0   Max.   :95.00   Max.   :113.00   Max.   :1.800  
##  NA's   :8       NA's   :8       NA's   :8        NA's   :8
attach(hirs)

boxplot(hirs[, 2:5])

par(mfrow = c(2, 2))
boxplot(hirs[, 2] ~ Treatment, ylim = c(0, 30), main = names(hirs)[2], xlab = "Treatment")
boxplot(hirs[, 3] ~ Treatment, ylim = c(0, 30), main = names(hirs)[3], xlab = "Treatment")
boxplot(hirs[, 4] ~ Treatment, ylim = c(0, 30), main = names(hirs)[4], xlab = "Treatment")
boxplot(hirs[, 5] ~ Treatment, ylim = c(0, 30), main = names(hirs)[5], xlab = "Treatment")

par(mfrow = c(1, 1))

par(mfrow = c(2, 2))
boxplot(hirs[Treatment == 0, 2:5], ylim = c(0, 30), main = "Treatment 0")
boxplot(hirs[Treatment == 1, 2:5], ylim = c(0, 30), main = "Treatment 1")
boxplot(hirs[Treatment == 2, 2:5], ylim = c(0, 30), main = "Treatment 2")
boxplot(hirs[Treatment == 3, 2:5], ylim = c(0, 30), main = "Treatment 3")

par(mfrow = c(1, 1))

# Remove observations with missing data
hirs <- na.omit(hirs)

# Pacient 84 has FGm12 < 0, so we set it 0
hirs[hirs$FGm12 < 0, "FGm12"] <- 0

The boxplots show that all 4 treatments reduce the hirsutism level as months go by. Treatment 0 (only contraceptive) is the less effective one, while the other three are equally powerful on average, but with different distributions of levels.

We start the modelling of FGm12 by fitting a full linear model. After finding a good lm, we will build a semiparametric GAM based on it. Note that we cannot use variables FGm3 nor FGm6 as predictors.

Fitting a LM

gam1 <- gam(
  FGm12 ~ Treatment + FGm0 + SysPres + DiaPres
    + weight + height,
  data = hirs
)
summary(gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + FGm0 + SysPres + DiaPres + weight + height
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.85710   14.79649   1.274 0.206111    
## Treatment1  -4.32668    1.47552  -2.932 0.004360 ** 
## Treatment2  -4.29888    1.49026  -2.885 0.005005 ** 
## Treatment3  -3.87512    1.43820  -2.694 0.008551 ** 
## FGm0         0.59847    0.16798   3.563 0.000615 ***
## SysPres     -0.07357    0.05174  -1.422 0.158889    
## DiaPres      0.03294    0.07088   0.465 0.643340    
## weight       0.02578    0.04409   0.585 0.560279    
## height      -8.27818    9.05146  -0.915 0.363100    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.169   Deviance explained = 24.3%
## GCV =  24.95  Scale est. = 22.482    n = 91

We shall remove non-significant variables.

gam4 <- gam(FGm12 ~ Treatment + FGm0,
  data = hirs
)
summary(gam4)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + FGm0
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.6111     3.0551   0.200 0.841925    
## Treatment1   -4.5812     1.4391  -3.183 0.002027 ** 
## Treatment2   -4.4290     1.4430  -3.069 0.002870 ** 
## Treatment3   -3.5497     1.3820  -2.568 0.011942 *  
## FGm0          0.6217     0.1624   3.829 0.000244 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.179   Deviance explained = 21.6%
## GCV =   23.5  Scale est. = 22.208    n = 91

The model is pretty similar in terms of R-sq.(adj) and Deviance explained. The effect of FGm0 might vary with the treatment, so let us add interactions.

gam5 <- gam(FGm12 ~ FGm0 * Treatment,
  data = hirs
)
summary(gam5)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ FGm0 * Treatment
## 
## Parametric coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       5.479373   8.236653   0.665    0.508
## FGm0              0.347752   0.460024   0.756    0.452
## Treatment1       -8.335161   9.930825  -0.839    0.404
## Treatment2       -4.164049  10.917542  -0.381    0.704
## Treatment3      -14.046703   9.737356  -1.443    0.153
## FGm0:Treatment1   0.215853   0.540888   0.399    0.691
## FGm0:Treatment2   0.008244   0.588629   0.014    0.989
## FGm0:Treatment3   0.579027   0.536859   1.079    0.284
## 
## 
## R-sq.(adj) =  0.171   Deviance explained = 23.5%
## GCV = 24.596  Scale est. = 22.434    n = 91

R-sq.(adj) has actually decreased when adding the interaction between FGm0 and Treatment.

We will now compare gam4 and gam5 against all other models with anova. After finding the best model so far, we will smooth more predictive variables.

anova(gam4, gam5, gam1, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + FGm0
## Model 2: FGm12 ~ FGm0 * Treatment
## Model 3: FGm12 ~ Treatment + FGm0 + SysPres + DiaPres + weight + height
##   Resid. Df Resid. Dev Df Deviance      F Pr(>F)
## 1        86     1909.9                          
## 2        83     1862.0  3   47.896 0.7101 0.5487
## 3        82     1843.5  1   18.485 0.8222 0.3672
anova(gam5, gam4, gam1, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ FGm0 * Treatment
## Model 2: FGm12 ~ Treatment + FGm0
## Model 3: FGm12 ~ Treatment + FGm0 + SysPres + DiaPres + weight + height
##   Resid. Df Resid. Dev Df Deviance      F Pr(>F)
## 1        83     1862.0                          
## 2        86     1909.9 -3  -47.896 0.7101 0.5487
## 3        82     1843.5  4   66.381 0.7381 0.5686

We cannot reject the hypothesis that adding interactions does not improve the model.

We decide that the best model is gam4 because it is simpler than gam5.

vis.gam(gam4, plot.type = "persp", theta = 30, phi = 30, type = "response")

vis.gam(gam4, plot.type = "contour", type = "response", main = "FGm12 predicted with gam4")

Let us analyze the residuals of gam4.

gam.check(gam4)

## 
## Method: GCV   Optimizer: magic
## Model required no smoothing parameter selectionModel rank =  5 / 5

Residuals seem to be homoscedastic and centered around zero but they seem to be more left tailed than a normal distribution. Moreover the plot of the fitted vs real values shows that the model is not very precise in fitting the data, this could be due to underfitting. Therefore, our linear model seems to be to simple and suffers in catching the structure of the data.

From here, we can add smooth terms to obtain a semiparametric GAM.

Fitting a semiparametric GAM

We can first try to smooth FGm0.

gam4.1 <- gam(FGm12 ~ Treatment + s(FGm0),
  data = hirs
)
summary(gam4.1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.3682     0.9761  12.671  < 2e-16 ***
## Treatment1   -5.0772     1.3920  -3.648 0.000466 ***
## Treatment2   -4.5785     1.3902  -3.293 0.001467 ** 
## Treatment3   -3.5242     1.3417  -2.627 0.010305 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value   
## s(FGm0) 5.74  6.869 4.012 0.00102 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.259   Deviance explained = 33.1%
## GCV = 22.447  Scale est. = 20.044    n = 91

R-sq.(adj) and Deviance explained increase significantly.

Now we will try to add interactions.

gam4.2 <- gam(FGm12 ~ s(FGm0, by = Treatment),
  data = hirs
)
summary(gam4.2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.248      0.542   15.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                      edf Ref.df     F p-value   
## s(FGm0):Treatment0 5.529  6.263 2.517  0.0256 * 
## s(FGm0):Treatment1 1.849  2.329 2.579  0.0796 . 
## s(FGm0):Treatment2 1.000  1.000 0.991  0.3225   
## s(FGm0):Treatment3 3.957  4.867 4.157  0.0026 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.288   Deviance explained = 38.5%
## GCV =  22.58  Scale est. = 19.271    n = 91

R-sq.(adj) and explained Deviance improve slightly even if terms become less significant. Let us compare both models with anova.

anova(gam4, gam4.1, gam4.2, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + FGm0
## Model 2: FGm12 ~ Treatment + s(FGm0)
## Model 3: FGm12 ~ s(FGm0, by = Treatment)
##   Resid. Df Resid. Dev     Df Deviance      F  Pr(>F)  
## 1    86.000     1909.9                                 
## 2    80.131     1628.8 5.8685   281.13 2.4858 0.03104 *
## 3    75.540     1496.7 4.5911   132.09 1.4929 0.20649  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(gam4.1, gam4.2, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + s(FGm0)
## Model 2: FGm12 ~ s(FGm0, by = Treatment)
##   Resid. Df Resid. Dev     Df Deviance      F Pr(>F)
## 1    80.131     1628.8                              
## 2    75.540     1496.7 4.5911   132.09 1.4929 0.2065

We cannot reject the hypothesis that adding interactions does not improve the model. However, we will keep them in the following models because intuitively they should be useful. Later, with more terms, we will check again if interactions should be added.

Now we will introduce previously removed variables one by one.

gam4.4 <- gam(FGm12 ~ Treatment + s(FGm0) + s(SysPres),
  data = hirs
)
summary(gam4.4)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(SysPres)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2573     0.9757  12.562  < 2e-16 ***
## Treatment1   -4.9105     1.3993  -3.509 0.000743 ***
## Treatment2   -4.3015     1.4037  -3.064 0.002977 ** 
## Treatment3   -3.5110     1.3316  -2.637 0.010067 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df     F p-value   
## s(FGm0)    5.811  6.936 3.891 0.00121 **
## s(SysPres) 1.711  2.164 1.097 0.37270   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.272   Deviance explained = 35.7%
## GCV = 22.542  Scale est. = 19.688    n = 91

s(SysPres) is not significant.

gam4.5 <- gam(FGm12 ~ Treatment + s(FGm0) + s(DiaPres),
  data = hirs
)
summary(gam4.5)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(DiaPres)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.3478     0.9831  12.560  < 2e-16 ***
## Treatment1   -5.0322     1.4090  -3.571 0.000609 ***
## Treatment2   -4.5344     1.4112  -3.213 0.001903 ** 
## Treatment3   -3.5283     1.3422  -2.629 0.010299 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df     F p-value   
## s(FGm0)    6.153  7.279 3.804 0.00105 **
## s(DiaPres) 2.115  2.661 0.990 0.52962   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.275   Deviance explained = 36.6%
## GCV = 22.676  Scale est. = 19.619    n = 91

s(DiaPres) is not significant.

gam4.6 <- gam(FGm12 ~ Treatment + s(FGm0) + s(weight),
  data = hirs
)
summary(gam4.6)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(weight)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.2937     0.9897  12.422  < 2e-16 ***
## Treatment1   -4.6889     1.4163  -3.311  0.00142 ** 
## Treatment2   -4.5484     1.4201  -3.203  0.00198 ** 
## Treatment3   -3.6212     1.3583  -2.666  0.00934 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##             edf Ref.df     F p-value   
## s(FGm0)   5.211  6.279 3.929 0.00143 **
## s(weight) 4.389  5.395 0.835 0.56494   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.268   Deviance explained = 37.1%
## GCV = 23.273  Scale est. = 19.795    n = 91

s(weight) is not significant.

gam4.7 <- gam(FGm12 ~ Treatment + s(FGm0) + s(height),
  data = hirs
)
summary(gam4.7)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(height)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.4056     0.9762  12.708  < 2e-16 ***
## Treatment1   -5.0209     1.3932  -3.604 0.000543 ***
## Treatment2   -4.5568     1.3905  -3.277 0.001552 ** 
## Treatment3   -3.7291     1.3646  -2.733 0.007727 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##             edf Ref.df     F  p-value    
## s(FGm0)   5.930  7.061 4.046 0.000787 ***
## s(height) 1.001  1.002 0.853 0.358823    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.261   Deviance explained = 34.2%
## GCV = 22.726  Scale est. = 19.996    n = 91

s(height) is not significant.

We will now try tensor splines one by one.

gam4.8 <- gam(FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres),
  data = hirs
)
summary(gam4.8)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.3083     0.9651  12.754  < 2e-16 ***
## Treatment1   -5.0462     1.3884  -3.635 0.000502 ***
## Treatment2   -4.2829     1.3914  -3.078 0.002889 ** 
## Treatment3   -3.5935     1.3168  -2.729 0.007876 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                       edf Ref.df     F p-value   
## s(FGm0)             6.346  7.461 3.814 0.00136 **
## te(SysPres,DiaPres) 3.996  4.635 1.480 0.19701   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.304   Deviance explained = 40.7%
## GCV = 22.366  Scale est. = 18.841    n = 91

Still, not significant. We will try with te(weight, height).

gam4.9 <- gam(FGm12 ~ Treatment + s(FGm0) + te(weight, height),
  data = hirs
)
summary(gam4.9)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + te(weight, height)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.6476     0.9869  12.816  < 2e-16 ***
## Treatment1   -5.2275     1.3936  -3.751 0.000338 ***
## Treatment2   -4.8505     1.4085  -3.444 0.000929 ***
## Treatment3   -4.1696     1.3981  -2.982 0.003825 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                     edf Ref.df     F  p-value    
## s(FGm0)           6.238  7.339 4.283 0.000453 ***
## te(weight,height) 3.362  3.659 0.914 0.460328    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.275   Deviance explained = 37.7%
## GCV =  23.05  Scale est. = 19.605    n = 91

Not significant.

Now we will add interactions, one by one.

gam4.4.int <- gam(FGm12 ~ Treatment + s(FGm0) + s(SysPres, by = Treatment),
  data = hirs
)
summary(gam4.4.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(SysPres, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.113      1.026  11.801  < 2e-16 ***
## Treatment1    -4.795      1.451  -3.304  0.00146 ** 
## Treatment2    -4.337      1.449  -2.993  0.00372 ** 
## Treatment3    -3.402      1.399  -2.432  0.01739 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df     F p-value   
## s(FGm0)               5.911  7.037 3.550 0.00222 **
## s(SysPres):Treatment0 1.492  1.832 0.860 0.33552   
## s(SysPres):Treatment1 1.000  1.000 0.058 0.81104   
## s(SysPres):Treatment2 1.000  1.001 0.012 0.91518   
## s(SysPres):Treatment3 1.558  1.934 0.640 0.46465   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.262   Deviance explained = 37.6%
## GCV = 23.894  Scale est. = 19.966    n = 91

s(SysPres, by=Treatment) is not significant.

gam4.5.int <- gam(FGm12 ~ Treatment + s(FGm0) + s(DiaPres, by = Treatment),
  data = hirs
)
summary(gam4.5.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(DiaPres, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.147      1.042  11.660  < 2e-16 ***
## Treatment1    -4.849      1.459  -3.323  0.00138 ** 
## Treatment2    -4.384      1.454  -3.015  0.00350 ** 
## Treatment3    -3.384      1.391  -2.433  0.01734 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df     F p-value   
## s(FGm0)               6.051  7.180 3.048 0.00633 **
## s(DiaPres):Treatment0 1.000  1.000 0.146 0.70304   
## s(DiaPres):Treatment1 1.000  1.000 0.001 0.97320   
## s(DiaPres):Treatment2 1.000  1.000 0.001 0.97783   
## s(DiaPres):Treatment3 2.429  3.012 1.479 0.22289   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.275   Deviance explained = 39.1%
## GCV = 23.641  Scale est. = 19.619    n = 91

s(DiaPres, by=Treatment) is not significant.

gam4.6.int <- gam(FGm12 ~ Treatment + s(FGm0) + s(weight, by = Treatment),
  data = hirs
)
summary(gam4.6.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(weight, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.111      1.046  11.573  < 2e-16 ***
## Treatment1    -4.927      1.504  -3.277  0.00159 ** 
## Treatment2    -4.378      1.469  -2.980  0.00388 ** 
## Treatment3    -3.322      1.394  -2.384  0.01964 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df     F p-value   
## s(FGm0)              5.337  6.426 3.996 0.00151 **
## s(weight):Treatment0 1.899  2.348 0.405 0.60837   
## s(weight):Treatment1 1.431  1.736 0.397 0.53697   
## s(weight):Treatment2 1.000  1.000 0.158 0.69169   
## s(weight):Treatment3 1.503  1.846 0.375 0.64459   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.258   Deviance explained = 37.5%
## GCV = 24.074  Scale est. = 20.061    n = 91

s(weight, by=Treatment) is not significant.

gam4.7.int <- gam(FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment),
  data = hirs
)
summary(gam4.7.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.738      1.068  11.928  < 2e-16 ***
## Treatment1    -5.461      1.436  -3.803 0.000299 ***
## Treatment2    -4.782      1.396  -3.426 0.001024 ** 
## Treatment3    -4.244      1.422  -2.983 0.003907 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                        edf Ref.df     F p-value   
## s(FGm0)              6.071  7.108 3.666 0.00171 **
## s(height):Treatment0 3.766  4.486 2.114 0.08190 . 
## s(height):Treatment1 2.038  2.532 1.816 0.13289   
## s(height):Treatment2 2.323  2.836 1.756 0.22275   
## s(height):Treatment3 1.824  2.267 1.492 0.29097   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.398   Deviance explained = 52.5%
## GCV = 20.884  Scale est. = 16.289    n = 91

s(height, by=Treatment) is not significant.

We will now try tensor splines one by one.

gam4.8.int <- gam(FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres, by = Treatment),
  data = hirs
)
summary(gam4.8.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.940      1.160  11.154  < 2e-16 ***
## Treatment1    -5.723      1.545  -3.704 0.000431 ***
## Treatment2    -5.939      1.582  -3.755 0.000365 ***
## Treatment3    -4.218      1.470  -2.870 0.005484 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F p-value  
## s(FGm0)                        6.296  7.401 2.808  0.0101 *
## te(SysPres,DiaPres):Treatment0 3.000  3.000 1.643  0.1876  
## te(SysPres,DiaPres):Treatment1 3.792  4.334 0.417  0.7863  
## te(SysPres,DiaPres):Treatment2 3.875  4.395 0.690  0.6552  
## te(SysPres,DiaPres):Treatment3 3.000  3.000 3.282  0.0261 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.332   Deviance explained = 50.2%
## GCV = 24.545  Scale est. = 18.082    n = 91

Still, not significant. We will try with te(weight, height).

gam4.9.int <- gam(FGm12 ~ Treatment + s(FGm0) + te(weight, height, by = Treatment),
  data = hirs
)
summary(gam4.9.int)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + te(weight, height, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.920      1.460   8.848 7.44e-13 ***
## Treatment1    -4.852      1.762  -2.754  0.00758 ** 
## Treatment2    -5.136      1.727  -2.973  0.00410 ** 
## Treatment3    -4.262      1.717  -2.483  0.01555 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                edf Ref.df      F  p-value    
## s(FGm0)                      1.000  1.000 13.202 0.000543 ***
## te(weight,height):Treatment0 6.212  6.972  1.777 0.105300    
## te(weight,height):Treatment1 3.000  3.000  2.451 0.070988 .  
## te(weight,height):Treatment2 3.746  4.271  0.637 0.641862    
## te(weight,height):Treatment3 6.351  7.128  2.508 0.023349 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.378   Deviance explained = 53.9%
## GCV = 22.967  Scale est. = 16.832    n = 91

Overall, it looks like the best model is gam4.2. Let us compare it will all other models with anova.

anova(gam4.2, gam4.4, gam4.5, gam4.6, gam4.7, gam4.8, gam4.9,
  gam4.4.int, gam4.5.int, gam4.6.int, gam4.7.int, gam4.8.int, gam4.9.int,
  test = "F"
)
## Analysis of Deviance Table
## 
## Model  1: FGm12 ~ s(FGm0, by = Treatment)
## Model  2: FGm12 ~ Treatment + s(FGm0) + s(SysPres)
## Model  3: FGm12 ~ Treatment + s(FGm0) + s(DiaPres)
## Model  4: FGm12 ~ Treatment + s(FGm0) + s(weight)
## Model  5: FGm12 ~ Treatment + s(FGm0) + s(height)
## Model  6: FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres)
## Model  7: FGm12 ~ Treatment + s(FGm0) + te(weight, height)
## Model  8: FGm12 ~ Treatment + s(FGm0) + s(SysPres, by = Treatment)
## Model  9: FGm12 ~ Treatment + s(FGm0) + s(DiaPres, by = Treatment)
## Model 10: FGm12 ~ Treatment + s(FGm0) + s(weight, by = Treatment)
## Model 11: FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
## Model 12: FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres, by = Treatment)
## Model 13: FGm12 ~ Treatment + s(FGm0) + te(weight, height, by = Treatment)
##    Resid. Df Resid. Dev       Df Deviance       F   Pr(>F)   
## 1     75.540     1496.7                                      
## 2     77.900     1564.8 -2.35996   -68.05  1.7133 0.183041   
## 3     77.060     1544.6  0.84066    20.15  1.4238 0.230910   
## 4     75.326     1532.1  1.73348    12.50  0.4286 0.624649   
## 5     78.937     1601.1 -3.61126   -68.98  1.1348 0.346110   
## 6     74.905     1444.3  4.03264   156.79  2.3100 0.066696 . 
## 7     76.002     1517.5 -1.09720   -73.16  3.9616 0.047165 * 
## 8     74.196     1518.2  1.80618    -0.72                    
## 9     73.808     1481.6  0.38761    36.56  5.6042 0.046457 * 
## 10    73.644     1521.2  0.16425   -39.61                    
## 11    67.770     1156.2  5.87409   365.07  3.6924 0.003438 **
## 12    64.870     1212.2  2.89951   -56.03                    
## 13    64.628     1122.5  0.24192    89.65 22.0174 0.003567 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We remove the worst models.

anova(gam4.1, gam4.8, gam4.9,
  gam4.5.int, gam4.7.int, gam4.9.int,
  test = "F"
)
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + s(FGm0)
## Model 2: FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres)
## Model 3: FGm12 ~ Treatment + s(FGm0) + te(weight, height)
## Model 4: FGm12 ~ Treatment + s(FGm0) + s(DiaPres, by = Treatment)
## Model 5: FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
## Model 6: FGm12 ~ Treatment + s(FGm0) + te(weight, height, by = Treatment)
##   Resid. Df Resid. Dev      Df Deviance      F  Pr(>F)   
## 1    80.131     1628.8                                   
## 2    74.905     1444.3  5.2267   184.50 2.0972 0.07419 . 
## 3    76.002     1517.5 -1.0972   -73.16 3.9616 0.04716 * 
## 4    73.808     1481.6  2.1938    35.84 0.9707 0.39093   
## 5    67.770     1156.2  6.0383   325.47 3.2023 0.00804 **
## 6    64.628     1122.5  3.1414    33.63 0.6360 0.60161   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We do the same again.

anova(gam4.1, gam4.8, gam4.9,
  gam4.7.int,
  test = "F"
)
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + s(FGm0)
## Model 2: FGm12 ~ Treatment + s(FGm0) + te(SysPres, DiaPres)
## Model 3: FGm12 ~ Treatment + s(FGm0) + te(weight, height)
## Model 4: FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
##   Resid. Df Resid. Dev      Df Deviance      F  Pr(>F)  
## 1    80.131     1628.8                                  
## 2    74.905     1444.3  5.2267   184.50 2.1671 0.06521 .
## 3    76.002     1517.5 -1.0972   -73.16 4.0936 0.04342 *
## 4    67.770     1156.2  8.2321   361.31 2.6945 0.01164 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It seems that adding s(height, by = Treatment) is useful. We can also try to add te(SysPres, DiaPres, by = Treatment).

gam4.10 <- gam(
  FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment),
  data = hirs
)
summary(gam4.10)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment) + te(SysPres, 
##     DiaPres, by = Treatment)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -5.065     11.382  -0.445    0.658
## Treatment1    12.352     11.419   1.082    0.285
## Treatment2    12.163     11.407   1.066    0.291
## Treatment3    13.484     11.403   1.182    0.243
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F p-value   
## s(FGm0)                        5.957  6.967 3.794 0.00217 **
## s(height):Treatment0           5.401  5.917 1.317 0.26609   
## s(height):Treatment1           2.351  2.834 2.381 0.05916 . 
## s(height):Treatment2           1.960  2.354 1.843 0.21313   
## s(height):Treatment3           2.408  2.982 1.796 0.13219   
## te(SysPres,DiaPres):Treatment0 7.263  7.601 1.270 0.32081   
## te(SysPres,DiaPres):Treatment1 4.312  5.000 0.622 0.68455   
## te(SysPres,DiaPres):Treatment2 3.904  4.380 1.120 0.40402   
## te(SysPres,DiaPres):Treatment3 3.000  3.000 4.306 0.00888 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.541   Deviance explained = 74.3%
## GCV =   22.4  Scale est. = 12.417    n = 91

R-sq.(adj) and Deviance explained have increased a lot. Let us check the residuals of gam4.10.

gam.check(gam4.10)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 27 iterations.
## The RMS GCV score gradient at convergence was 4.840213e-07 .
## The Hessian was positive definite.
## Model rank =  145 / 145 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                   k'   edf k-index p-value
## s(FGm0)                         9.00  5.96    1.16    0.94
## s(height):Treatment0            9.00  5.40    0.94    0.24
## s(height):Treatment1            9.00  2.35    0.94    0.28
## s(height):Treatment2            9.00  1.96    0.94    0.27
## s(height):Treatment3            9.00  2.41    0.94    0.18
## te(SysPres,DiaPres):Treatment0 24.00  7.26    1.12    0.89
## te(SysPres,DiaPres):Treatment1 24.00  4.31    1.04    0.61
## te(SysPres,DiaPres):Treatment2 24.00  3.90    1.15    0.91
## te(SysPres,DiaPres):Treatment3 24.00  3.00    1.12    0.78

The tensor spline has 25 knots, which are way more than the edf of its variables. Hence, we reduce them to 4^2 = 16.

gam4.11 <- gam(
  FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.11)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment) + te(SysPres, 
##     DiaPres, by = Treatment, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.858      1.245  10.331 9.57e-15 ***
## Treatment1    -5.564      1.580  -3.521 0.000847 ***
## Treatment2    -5.492      1.615  -3.401 0.001225 ** 
## Treatment3    -4.703      1.562  -3.011 0.003858 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F p-value   
## s(FGm0)                        5.979  6.990 3.061 0.00787 **
## s(height):Treatment0           3.604  4.319 1.448 0.19943   
## s(height):Treatment1           1.929  2.357 1.637 0.16382   
## s(height):Treatment2           1.902  2.296 1.604 0.25767   
## s(height):Treatment3           1.610  1.989 1.200 0.28269   
## te(SysPres,DiaPres):Treatment0 3.040  3.077 0.517 0.69287   
## te(SysPres,DiaPres):Treatment1 3.680  4.138 0.429 0.79633   
## te(SysPres,DiaPres):Treatment2 3.751  4.156 0.795 0.51798   
## te(SysPres,DiaPres):Treatment3 3.720  4.201 2.313 0.05767 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.442   Deviance explained = 64.1%
## GCV = 23.791  Scale est. = 15.107    n = 91
gam.check(gam4.11)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 27 iterations.
## The RMS GCV score gradient at convergence was 3.757111e-07 .
## The Hessian was positive definite.
## Model rank =  109 / 109 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                   k'   edf k-index p-value
## s(FGm0)                         9.00  5.98    1.10    0.82
## s(height):Treatment0            9.00  3.60    0.90    0.19
## s(height):Treatment1            9.00  1.93    0.90    0.14
## s(height):Treatment2            9.00  1.90    0.90    0.16
## s(height):Treatment3            9.00  1.61    0.90    0.18
## te(SysPres,DiaPres):Treatment0 15.00  3.04    1.20    0.95
## te(SysPres,DiaPres):Treatment1 15.00  3.68    1.19    0.97
## te(SysPres,DiaPres):Treatment2 15.00  3.75    1.19    0.96
## te(SysPres,DiaPres):Treatment3 15.00  3.72    1.20    0.96
anova(gam4.10, gam4.11, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment) + te(SysPres, 
##     DiaPres, by = Treatment)
## Model 2: FGm12 ~ Treatment + s(FGm0) + s(height, by = Treatment) + te(SysPres, 
##     DiaPres, by = Treatment, k = 4)
##   Resid. Df Resid. Dev      Df Deviance      F  Pr(>F)  
## 1    45.965     626.38                                  
## 2    53.478     872.98 -7.5131   -246.6 2.6434 0.01978 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Even though R-sq.(adj) and explained Deviance have decreased a little, we can reject that decreasing the number of knots of te(weight, height, by=Treatment) does not improve the model.

Maybe we need more interactions. We will start with s(FGm0).

gam4.12 <- gam(
  FGm12 ~ Treatment + s(FGm0, by = Treatment) + s(height, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.12)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     te(SysPres, DiaPres, by = Treatment, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.230      1.207  10.132 5.34e-14 ***
## Treatment1    -5.260      1.525  -3.448  0.00111 ** 
## Treatment2    -5.400      1.598  -3.378  0.00137 ** 
## Treatment3    -3.972      1.592  -2.494  0.01578 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F p-value   
## s(FGm0):Treatment0             1.000  1.000 0.002 0.96680   
## s(FGm0):Treatment1             4.647  5.531 2.475 0.03546 * 
## s(FGm0):Treatment2             1.000  1.000 0.655 0.42186   
## s(FGm0):Treatment3             4.053  4.892 4.800 0.00121 **
## s(height):Treatment0           3.786  4.519 1.132 0.27648   
## s(height):Treatment1           1.378  1.627 0.590 0.39446   
## s(height):Treatment2           1.463  1.772 0.474 0.65956   
## s(height):Treatment3           1.000  1.000 3.404 0.07061 . 
## te(SysPres,DiaPres):Treatment0 3.000  3.000 0.610 0.61149   
## te(SysPres,DiaPres):Treatment1 3.595  3.995 0.596 0.67454   
## te(SysPres,DiaPres):Treatment2 4.509  5.048 1.301 0.27995   
## te(SysPres,DiaPres):Treatment3 4.597  5.238 3.700 0.00749 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.496   Deviance explained = 70.4%
## GCV = 23.406  Scale est. = 13.625    n = 91

R-sq.(adj) and explained Deviance have increased again. weight is the only baseline predictor we are not using; let us add it again, but now interacting with Treatment.

gam4.13 <- gam(
  FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment)
    + s(weight, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.13)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     s(weight, by = Treatment) + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.8015     0.3022   25.82 6.63e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df      F  p-value    
## s(FGm0):Treatment0             6.398  6.516 19.844 8.05e-06 ***
## s(FGm0):Treatment1             8.743  8.912 22.964 1.54e-06 ***
## s(FGm0):Treatment2             1.000  1.000  1.634 0.221950    
## s(FGm0):Treatment3             6.457  7.131 44.199  < 2e-16 ***
## s(height):Treatment0           7.065  7.206 25.957 1.41e-06 ***
## s(height):Treatment1           2.322  2.750  4.637 0.021204 *  
## s(height):Treatment2           3.364  3.850  2.807 0.067149 .  
## s(height):Treatment3           1.000  1.000 28.625 0.000105 ***
## s(weight):Treatment0           1.000  1.000 14.590 0.001880 ** 
## s(weight):Treatment1           1.859  2.199  6.582 0.008554 ** 
## s(weight):Treatment2           7.926  8.409 21.405 7.64e-06 ***
## s(weight):Treatment3           4.501  5.192  3.004 0.038868 *  
## te(SysPres,DiaPres):Treatment0 6.177  6.279 15.830 1.77e-05 ***
## te(SysPres,DiaPres):Treatment1 4.438  4.994  5.180 0.006575 ** 
## te(SysPres,DiaPres):Treatment2 6.411  6.703 27.106 8.59e-06 ***
## te(SysPres,DiaPres):Treatment3 7.806  8.638 24.197 4.62e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.965   Deviance explained = 99.5%
## GCV = 6.2762  Scale est. = 0.9334    n = 91

This seems to be a good model. Let us check that the number of knots is a proper one.

gam.check(gam4.13)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 258 iterations.
## The RMS GCV score gradient at convergence was 0.0008328835 .
## The Hessian was not positive definite.
## Model rank =  169 / 169 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                   k'   edf k-index p-value
## s(FGm0):Treatment0              9.00  6.40    1.26    0.99
## s(FGm0):Treatment1              9.00  8.74    1.26    0.99
## s(FGm0):Treatment2              9.00  1.00    1.26    0.98
## s(FGm0):Treatment3              9.00  6.46    1.26    0.99
## s(height):Treatment0            9.00  7.06    0.98    0.41
## s(height):Treatment1            9.00  2.32    0.98    0.38
## s(height):Treatment2            9.00  3.36    0.98    0.38
## s(height):Treatment3            9.00  1.00    0.98    0.39
## s(weight):Treatment0            9.00  1.00    1.12    0.80
## s(weight):Treatment1            9.00  1.86    1.12    0.86
## s(weight):Treatment2            9.00  7.93    1.12    0.80
## s(weight):Treatment3            9.00  4.50    1.12    0.86
## te(SysPres,DiaPres):Treatment0 15.00  6.18    1.09    0.78
## te(SysPres,DiaPres):Treatment1 15.00  4.44    1.16    0.91
## te(SysPres,DiaPres):Treatment2 15.00  6.41    1.17    0.93
## te(SysPres,DiaPres):Treatment3 15.00  7.81    1.18    0.95

Gam.13 shows nice fitted values compared to real ones, however residuals are not normally distributed and seem to have heavier tails, highlighting that in some cases the model deviates significantly from the real values.

The number of knots of s(FGm0, by=Treatment) should be higher, but fitting gam4.13 is so slow that adding more knots causes the algorithm to crash.

For some treatments, FGm0, height and weight seem to be linear. Let us make them linear terms one by one.

gam4.13.1 <- gam(
  FGm12 ~ FGm0 * Treatment + s(height, by = Treatment)
    + s(weight, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.13.1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ FGm0 * Treatment + s(height, by = Treatment) + s(weight, 
##     by = Treatment) + te(SysPres, DiaPres, by = Treatment, k = 4)
## 
## Parametric coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       8.3103     9.2059   0.903   0.3709  
## FGm0              0.2653     0.4979   0.533   0.5965  
## Treatment1      -11.2548    10.8179  -1.040   0.3031  
## Treatment2      -10.3307    12.0184  -0.860   0.3940  
## Treatment3      -28.3519    11.8319  -2.396   0.0203 *
## FGm0:Treatment1   0.3148     0.5686   0.554   0.5822  
## FGm0:Treatment2   0.2195     0.6516   0.337   0.7376  
## FGm0:Treatment3   1.1934     0.6318   1.889   0.0646 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df      F p-value   
## s(height):Treatment0           5.293  5.925  1.921 0.09778 . 
## s(height):Treatment1           2.141  2.628  2.033 0.12375   
## s(height):Treatment2           2.094  2.584  3.192 0.04227 * 
## s(height):Treatment3           1.555  1.904  1.654 0.14461   
## s(weight):Treatment0           1.000  1.000  3.340 0.07348 . 
## s(weight):Treatment1           1.000  1.000  0.066 0.79837   
## s(weight):Treatment2           1.000  1.000 10.915 0.00175 **
## s(weight):Treatment3           1.769  2.168  0.787 0.41892   
## te(SysPres,DiaPres):Treatment0 3.000  3.000  0.520 0.67053   
## te(SysPres,DiaPres):Treatment1 3.788  4.298  0.679 0.68169   
## te(SysPres,DiaPres):Treatment2 4.125  4.719  3.125 0.01532 * 
## te(SysPres,DiaPres):Treatment3 5.199  5.888  3.937 0.00239 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.553   Deviance explained = 74.6%
## GCV = 21.572  Scale est. = 12.098    n = 91
anova(gam4.13, gam4.13.1, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     s(weight, by = Treatment) + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## Model 2: FGm12 ~ FGm0 * Treatment + s(height, by = Treatment) + s(weight, 
##     by = Treatment) + te(SysPres, DiaPres, by = Treatment, k = 4)
##   Resid. Df Resid. Dev      Df Deviance      F    Pr(>F)    
## 1     8.221      12.63                                      
## 2    46.887     617.44 -38.666   -604.8 16.758 0.0001177 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to anova, it is better to treat FGm0 linearly, but if we do so, then predictors are less significant and R-sq.(adj) and explained Deviance decrease a lot.

Let us try with height.

gam4.13.2 <- gam(
  FGm12 ~ s(FGm0, by = Treatment) + height * Treatment
    + s(weight, by = Treatment)
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.13.2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment) + height * Treatment + s(weight, 
##     by = Treatment) + te(SysPres, DiaPres, by = Treatment, k = 4)
## 
## Parametric coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)         32.716     37.281   0.878    0.384
## height             -12.821     22.992  -0.558    0.579
## Treatment1         -33.897     48.194  -0.703    0.485
## Treatment2          17.858     53.502   0.334    0.740
## Treatment3           9.314     42.763   0.218    0.828
## height:Treatment1   18.413     29.630   0.621    0.537
## height:Treatment2  -14.813     32.925  -0.450    0.655
## height:Treatment3   -7.718     26.490  -0.291    0.772
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F  p-value    
## s(FGm0):Treatment0             1.000  1.000 0.319 0.574477    
## s(FGm0):Treatment1             4.760  5.682 2.715 0.024192 *  
## s(FGm0):Treatment2             1.818  2.284 0.543 0.578144    
## s(FGm0):Treatment3             4.206  5.058 5.375 0.000461 ***
## s(weight):Treatment0           1.570  1.915 0.932 0.467985    
## s(weight):Treatment1           1.000  1.000 3.271 0.076304 .  
## s(weight):Treatment2           1.000  1.000 8.059 0.006445 ** 
## s(weight):Treatment3           1.394  1.662 1.300 0.183937    
## te(SysPres,DiaPres):Treatment0 3.368  3.689 1.843 0.177343    
## te(SysPres,DiaPres):Treatment1 3.000  3.000 1.265 0.296186    
## te(SysPres,DiaPres):Treatment2 4.706  5.245 2.919 0.020408 *  
## te(SysPres,DiaPres):Treatment3 3.000  3.000 5.976 0.001404 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.517   Deviance explained =   72%
## GCV =   22.8  Scale est. = 13.073    n = 91
anova(gam4.13, gam4.13.2, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     s(weight, by = Treatment) + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## Model 2: FGm12 ~ s(FGm0, by = Treatment) + height * Treatment + s(weight, 
##     by = Treatment) + te(SysPres, DiaPres, by = Treatment, k = 4)
##   Resid. Df Resid. Dev      Df Deviance      F   Pr(>F)    
## 1     8.221      12.63                                     
## 2    48.464     682.15 -40.243  -669.52 17.824 9.17e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We obtain analogous results. Let us finally try with weight.

gam4.13.3 <- gam(
  FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment)
    + weight * Treatment
    + te(SysPres, DiaPres, by = Treatment, k = 4),
  data = hirs
)
summary(gam4.13.3)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     weight * Treatment + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## 
## Parametric coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        25.52851   33.81411   0.755   0.4557  
## weight              0.17292    0.09130   1.894   0.0671 .
## Treatment1        -34.96403   34.31610  -1.019   0.3157  
## Treatment2        -37.41467   33.98955  -1.101   0.2790  
## Treatment3        -13.58996   34.02578  -0.399   0.6922  
## weight:Treatment1   0.07347    0.12873   0.571   0.5720  
## weight:Treatment2   0.10460    0.10378   1.008   0.3209  
## weight:Treatment3  -0.22138    0.10450  -2.118   0.0418 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df      F  p-value    
## s(FGm0):Treatment0             5.223  5.412  5.468 0.000824 ***
## s(FGm0):Treatment1             7.384  8.078  6.330 5.38e-05 ***
## s(FGm0):Treatment2             1.018  1.031  4.188 0.045268 *  
## s(FGm0):Treatment3             5.156  6.094 10.851 1.43e-06 ***
## s(height):Treatment0           6.904  7.139  5.438 0.000196 ***
## s(height):Treatment1           1.000  1.000  0.827 0.369817    
## s(height):Treatment2           2.955  3.570  6.292 0.001029 ** 
## s(height):Treatment3           1.152  1.254  4.982 0.016408 *  
## te(SysPres,DiaPres):Treatment0 6.063  6.216  3.759 0.005170 ** 
## te(SysPres,DiaPres):Treatment1 3.000  3.000  3.752 0.020101 *  
## te(SysPres,DiaPres):Treatment2 5.274  6.248  6.921 6.54e-05 ***
## te(SysPres,DiaPres):Treatment3 5.092  5.721 10.266 7.84e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.832   Deviance explained = 93.9%
## GCV = 12.632  Scale est. = 4.5502    n = 91
anova(gam4.13, gam4.13.3, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     s(weight, by = Treatment) + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## Model 2: FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     weight * Treatment + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
##   Resid. Df Resid. Dev      Df Deviance      F   Pr(>F)   
## 1     8.221     12.632                                    
## 2    28.236    149.146 -20.015  -136.51 7.3073 0.003172 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With weight, the worsening effect is much smaller. Let us compare the residuals of gam4.13 and gam4.13.3 to get a better picture of the situation.

gam.check(gam4.13)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 258 iterations.
## The RMS GCV score gradient at convergence was 0.0008328835 .
## The Hessian was not positive definite.
## Model rank =  169 / 169 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                   k'   edf k-index p-value
## s(FGm0):Treatment0              9.00  6.40    1.26    1.00
## s(FGm0):Treatment1              9.00  8.74    1.26    0.97
## s(FGm0):Treatment2              9.00  1.00    1.26    0.99
## s(FGm0):Treatment3              9.00  6.46    1.26    0.99
## s(height):Treatment0            9.00  7.06    0.98    0.39
## s(height):Treatment1            9.00  2.32    0.98    0.35
## s(height):Treatment2            9.00  3.36    0.98    0.33
## s(height):Treatment3            9.00  1.00    0.98    0.34
## s(weight):Treatment0            9.00  1.00    1.12    0.86
## s(weight):Treatment1            9.00  1.86    1.12    0.81
## s(weight):Treatment2            9.00  7.93    1.12    0.85
## s(weight):Treatment3            9.00  4.50    1.12    0.85
## te(SysPres,DiaPres):Treatment0 15.00  6.18    1.09    0.78
## te(SysPres,DiaPres):Treatment1 15.00  4.44    1.16    0.94
## te(SysPres,DiaPres):Treatment2 15.00  6.41    1.17    0.94
## te(SysPres,DiaPres):Treatment3 15.00  7.81    1.18    0.94
gam.check(gam4.13.3)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 148 iterations by steepest
## descent step failure.
## The RMS GCV score gradient at convergence was 7.198955e-07 .
## The Hessian was positive definite.
## Model rank =  140 / 140 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                   k'   edf k-index p-value
## s(FGm0):Treatment0              9.00  5.22    1.26    0.99
## s(FGm0):Treatment1              9.00  7.38    1.26    1.00
## s(FGm0):Treatment2              9.00  1.02    1.26    1.00
## s(FGm0):Treatment3              9.00  5.16    1.26    0.98
## s(height):Treatment0            9.00  6.90    0.93    0.20
## s(height):Treatment1            9.00  1.00    0.93    0.24
## s(height):Treatment2            9.00  2.96    0.93    0.22
## s(height):Treatment3            9.00  1.15    0.93    0.17
## te(SysPres,DiaPres):Treatment0 15.00  6.06    0.96    0.28
## te(SysPres,DiaPres):Treatment1 15.00  3.00    1.02    0.56
## te(SysPres,DiaPres):Treatment2 15.00  5.27    1.09    0.81
## te(SysPres,DiaPres):Treatment3 15.00  5.09    1.08    0.76

The residuals of gam4.13.3 are more normally distributed and homoscedastic than those of gam4.13. Hence, each model has different pros and cons: gam4.13 explains almost all of the variance in the data, but its residuals are not normally distributed nor hoomoscedastic, while gam4.13.3 explains a lot of the variance in the data (but not almost all of it) and its residuals follow the models’ assumptions better. In other words, gam4.13.3 shows a tradeoff between variance explanation and residuals’ quality.

Moreover, gam4.13 might overfit the data, while gam4.13.3 might generalize better, since it is more parsimonious.

We can also compare the fitted linear model with gam4.13.3.

summary(gam4)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ Treatment + FGm0
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.6111     3.0551   0.200 0.841925    
## Treatment1   -4.5812     1.4391  -3.183 0.002027 ** 
## Treatment2   -4.4290     1.4430  -3.069 0.002870 ** 
## Treatment3   -3.5497     1.3820  -2.568 0.011942 *  
## FGm0          0.6217     0.1624   3.829 0.000244 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## R-sq.(adj) =  0.179   Deviance explained = 21.6%
## GCV =   23.5  Scale est. = 22.208    n = 91
summary(gam4.13.3)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     weight * Treatment + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## 
## Parametric coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        25.52851   33.81411   0.755   0.4557  
## weight              0.17292    0.09130   1.894   0.0671 .
## Treatment1        -34.96403   34.31610  -1.019   0.3157  
## Treatment2        -37.41467   33.98955  -1.101   0.2790  
## Treatment3        -13.58996   34.02578  -0.399   0.6922  
## weight:Treatment1   0.07347    0.12873   0.571   0.5720  
## weight:Treatment2   0.10460    0.10378   1.008   0.3209  
## weight:Treatment3  -0.22138    0.10450  -2.118   0.0418 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df      F  p-value    
## s(FGm0):Treatment0             5.223  5.412  5.468 0.000824 ***
## s(FGm0):Treatment1             7.384  8.078  6.330 5.38e-05 ***
## s(FGm0):Treatment2             1.018  1.031  4.188 0.045268 *  
## s(FGm0):Treatment3             5.156  6.094 10.851 1.43e-06 ***
## s(height):Treatment0           6.904  7.139  5.438 0.000196 ***
## s(height):Treatment1           1.000  1.000  0.827 0.369817    
## s(height):Treatment2           2.955  3.570  6.292 0.001029 ** 
## s(height):Treatment3           1.152  1.254  4.982 0.016408 *  
## te(SysPres,DiaPres):Treatment0 6.063  6.216  3.759 0.005170 ** 
## te(SysPres,DiaPres):Treatment1 3.000  3.000  3.752 0.020101 *  
## te(SysPres,DiaPres):Treatment2 5.274  6.248  6.921 6.54e-05 ***
## te(SysPres,DiaPres):Treatment3 5.092  5.721 10.266 7.84e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.832   Deviance explained = 93.9%
## GCV = 12.632  Scale est. = 4.5502    n = 91
anova(gam4, gam4.13.3, test = "F")
## Analysis of Deviance Table
## 
## Model 1: FGm12 ~ Treatment + FGm0
## Model 2: FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     weight * Treatment + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
##   Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)    
## 1    86.000    1909.93                                     
## 2    28.236     149.15 57.764   1760.8 6.6992 3.049e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The improvement is clear, since the linear model was very limited. Hence, we finally decide gam4.13.3 as our best model:

FGm12 ~ s(FGm0, by=Treatment) + s(height, by=Treatment) + weight*Treatment + te(SysPres, DiaPres, by=Treatment, k=4)

Let us interpret and visualize gam4.13.3.

Model interpretation and visualization

summary(gam4.13.3)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## FGm12 ~ s(FGm0, by = Treatment) + s(height, by = Treatment) + 
##     weight * Treatment + te(SysPres, DiaPres, by = Treatment, 
##     k = 4)
## 
## Parametric coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        25.52851   33.81411   0.755   0.4557  
## weight              0.17292    0.09130   1.894   0.0671 .
## Treatment1        -34.96403   34.31610  -1.019   0.3157  
## Treatment2        -37.41467   33.98955  -1.101   0.2790  
## Treatment3        -13.58996   34.02578  -0.399   0.6922  
## weight:Treatment1   0.07347    0.12873   0.571   0.5720  
## weight:Treatment2   0.10460    0.10378   1.008   0.3209  
## weight:Treatment3  -0.22138    0.10450  -2.118   0.0418 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df      F  p-value    
## s(FGm0):Treatment0             5.223  5.412  5.468 0.000824 ***
## s(FGm0):Treatment1             7.384  8.078  6.330 5.38e-05 ***
## s(FGm0):Treatment2             1.018  1.031  4.188 0.045268 *  
## s(FGm0):Treatment3             5.156  6.094 10.851 1.43e-06 ***
## s(height):Treatment0           6.904  7.139  5.438 0.000196 ***
## s(height):Treatment1           1.000  1.000  0.827 0.369817    
## s(height):Treatment2           2.955  3.570  6.292 0.001029 ** 
## s(height):Treatment3           1.152  1.254  4.982 0.016408 *  
## te(SysPres,DiaPres):Treatment0 6.063  6.216  3.759 0.005170 ** 
## te(SysPres,DiaPres):Treatment1 3.000  3.000  3.752 0.020101 *  
## te(SysPres,DiaPres):Treatment2 5.274  6.248  6.921 6.54e-05 ***
## te(SysPres,DiaPres):Treatment3 5.092  5.721 10.266 7.84e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.832   Deviance explained = 93.9%
## GCV = 12.632  Scale est. = 4.5502    n = 91

Overall, we have fit a semiparametric GAM to the hirsutism dataset in order to predict the hirsutism level observed on patients after following a certain treatment for a year.

There are 4 different treatments and, according to the model, each affects the final hirsutism level differently based on the baseline height, weight, systolic blood pressure, diastolic blood pressure and hirsutism level. Baseline measures were taken at the start of the clinical trial.

The effects of FGm0 and height are estimated independently with non-parametric estimators. On the other hand, the term corresponding to the weight predictor has been estimated linearly to improve the parsimony and the residuals’ quality of the model. Finally, SysPres and DiaPres form a tensor spline with \(4^2\) knots.

As mentioned before, all terms interact with Treatment, which shows that, depending on the characteristics of the patient, one treatment or another should be prescribed. Nonetheless, treatments 1 and 2 reduce the hirsutism level the most on average. Treatment 3 could be prescribed to heavy people, however.

On average, treatment 0 (only contraceptive) does not show a significant reduction of the hirsutism level after a year of treatment, causing its expected value to be about 26. Treatments 1 and 2 (with antiandrogen), however, eradicate hirsutism level after a year, since their expected value is negative (and hence null). Finally, treatment 3 (also with antiandrogen) is expected to decrease the hirsutism level to about 12 in the same period of time.

Next, we will plot the non-parametric regression terms of the semiparametric GAM to identify what treatment is most suitable to different kinds of patients, as well as to find out about how the baseline characteristics of people affect their final hirsutism level on average.

par(mfrow = c(1, 1))
vis.gam(gam4.13.3, view = c("height", "weight"), plot.type = "persp", theta = 30, phi = 30)

vis.gam(gam4.13.3, view = c("height", "weight"), plot.type = "contour", main = "FGm12")

The height term seems almost constant on average. In addition, tall heavy people will have the smallest average hirsutism levels at the end of their treatment.

In the summary of the model, we can see that the weight coefficient is lowest for treatment 3. Hence, heavy people should undertake treatment 3.

plot(gam4.13.3, residuals = TRUE, shade = TRUE, seWithMean = TRUE, pages = 4)

s(FGm0):Treatment0 and s(height):Treatment0 have high standard errors at the boundaries and it is difficult to observe their fitted non-parametric regressions. This is probably caused by the lack of short or tall patients with high FGm0 (not necessarily both) that undertook treatment 0.

Let us look at the the smooth terms of treatments 1, 2 and 3.

# Plot smooth terms one by one, skipping Treatment0
par(mfrow = c(2, 3))
for (i in c(2, 3, 4, 6, 7, 8)) {
  plot(gam4.13.3,
    residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = i,
    ylim = c(-20, 20)
  )
}

par(mfrow = c(1, 1))

Now, let us look at the same terms for treatment 0.

par(mfrow = c(1, 2))
# FGm0
plot(gam4.13.3,
  residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 1,
  ylim = c(-100, 200)
)
# height
plot(gam4.13.3,
  residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 5,
  ylim = c(-500, 200)
)

par(mfrow = c(1, 1))

Let us zoom at the y-range with the biggest amount of patients.

par(mfrow = c(1, 2))
# FGm0
plot(gam4.13.3,
  residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 1,
  ylim = c(-100, 100)
)
# height
plot(gam4.13.3,
  residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 5,
  ylim = c(-100, 100)
)

par(mfrow = c(1, 1))

Even though more data would be needed, we can approximately state that treatment 0 could be appropriate for tall people with a low baseline hirsutism level. However, treatment 2, which is more effective on average than treatment 0, would also be suitable for those patients. Consequently, other characteristics of the patient should be considered to opt for treatment 0.

par(mfrow = c(1, 1))
vis.gam(gam4.13.3, view = c("SysPres", "DiaPres"), plot.type = "persp", theta = 30, phi = 30)

vis.gam(gam4.13.3, view = c("SysPres", "DiaPres"), plot.type = "contour", main = "FGm12")

The average hirsutism level after 1 year of treatment is lowest for patients with high diastolic blood pressure and low systolic pressure simultaneously or low diastolic blood pressure and high systolic blood pressure simultaneously as well. We will now check what treatment is most suitable according to the baseline systolic and diastolic blood pressures of patients.

# Treatment 0
plot(gam4.13.3, residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 9)

# Treatment 1
plot(gam4.13.3, residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 10)

# Treatment 2
plot(gam4.13.3, residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 11)

# Treatment 3
plot(gam4.13.3, residuals = TRUE, shade = TRUE, seWithMean = TRUE, select = 12)

To conclude, the semiparametric GAM we have fitted has allowed us not only to predict the hirsutism level after the treatment of a patient, but also to recommend him/her/them the most appropriate treatment according to his/her/their hirsutism level, height, weight and blood pressure. Moreover, our model explains 93.9% of the deviance, has an adjusted \(R^2\) of 0.832 and its residuals are normally distributed homoscedastic, making it correct, precise and able to generalize to new observations. Nonetheless, more short or tall patients with high FGm0 (not necessarily both) that undertook treatment 0 would have allowed us to better understand the base treatment without antiandrogen.